Scaling Private Record Linkage using Output Constrained Differential Privacy

Authors

  • Xi He
  • Ashwin Machanavajjhala
  • Cheryl J. Flynn
  • Divesh Srivastava
Abstract

Many scenarios require computing the join of databases held by two or more parties that do not trust one another. Private record linkage is a cryptographic tool that allows such a join to be computed without leaking any information about records that do not participate in the join output. However, such strong security comes with a cost: except for exact equi-joins, these techniques have a high computational cost. While blocking has been used to successfully scale nonprivate record linkage trading off efficiency with output accuracy, we show that prior techniques that use blocking, even based on differentially private algorithms, result in leakage of non-matching records. In this paper, we seek methods for speeding up private record linkage with strong provable privacy guarantees. We propose a novel privacy model, called output constrained differential privacy, that shares the strong privacy protection of differential privacy, but allows for the truthful release of the output of a certain function on the data. We apply this to private record linkage, and show that algorithms satisfying this privacy model permit the disclosure of the true matching records, but their execution is insensitive to the presence or absence of a single non-matching record. We show that prior attempts to speed up record linkage do not even satisfy our privacy model. We develop novel algorithms for private record linkage that satisfy our privacy model, and explore the resulting 3-way tradeoff between privacy, efficiency and output accuracy using experiments on real datasets.
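The efficiency/accuracy tradeoff that the abstract attributes to blocking can be illustrated with a small sketch. This is only a toy, non-private illustration and not the paper's algorithm: the `similar` matcher, the first-letter blocking key, and the sample names are all assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import product


def similar(a: str, b: str, max_diff: int = 1) -> bool:
    """Toy matcher (assumed): same length, at most max_diff differing characters."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= max_diff


def link_all_pairs(left, right):
    """Compare every cross pair: finds all matches, but cost is |left| * |right|."""
    matches = {(a, b) for a, b in product(left, right) if similar(a, b)}
    return matches, len(left) * len(right)


def link_with_blocking(left, right, key=lambda r: r[0]):
    """Compare only records that share a blocking key (assumed: first letter)."""
    blocks = defaultdict(list)
    for b in right:
        blocks[key(b)].append(b)
    matches, comparisons = set(), 0
    for a in left:
        # .get avoids creating empty blocks for keys that never occur in `right`
        for b in blocks.get(key(a), []):
            comparisons += 1
            if similar(a, b):
                matches.add((a, b))
    return matches, comparisons


# Hypothetical sample data
left = ["smith", "jones", "brown"]
right = ["smyth", "jones", "crown", "davis"]

full, n_full = link_all_pairs(left, right)
blocked, n_blocked = link_with_blocking(left, right)

print(f"all pairs: {n_full} comparisons, {len(full)} matches")
print(f"blocking:  {n_blocked} comparisons, {len(blocked)} matches")
print("missed by blocking:", full - blocked)
```

On this sample, blocking performs far fewer comparisons but misses the pair whose records fall into different blocks, which is the accuracy cost the abstract refers to. The sketch says nothing about privacy; per the abstract, it is the blocking step in prior private techniques that leaks information about non-matching records.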

Similar resources

Privacy Preserving Record Linkage via grams Projections

Record linkage has been extensively used in various data mining applications involving sharing data. While the amount of available data is growing, the concern of disclosing sensitive information poses the problem of utility vs privacy. In this paper, we study the problem of private record linkage via secure data transformations. In contrast to the existing techniques in this area, we propose a...

DPCube: Differentially Private Histogram Release through Multidimensional Partitioning

Differential privacy is a strong notion for protecting individual privacy in privacy preserving data analysis or publishing. In this paper, we study the problem of differentially private histogram release for random workloads. We study two multidimensional partitioning strategies including: 1) a baseline cell-based partitioning strategy for releasing an equi-width cell histogram, and 2) an inno...

Secure Blocking + Secure Matching = Secure Record Linkage

Performing approximate data matching has always been an intriguing problem for both industry and academia. This task becomes even more challenging when the requirement of data privacy rises. In this paper, we propose a novel technique to address the problem of efficient privacy-preserving approximate record linkage. The secure framework we propose consists of two basic components. First, we uti...

Privacy-Preserving Record Linkage

Record linkage has a long tradition in both the statistical and the computer science literature. We survey current approaches to the record linkage problem in a privacy-aware setting and contrast these with the more traditional literature. We also identify several important open questions that pertain to private record linkage from different per-

Preserving Privacy for Interesting Location Pattern Mining from Trajectory Data

One main concern for individuals participating in the data collection of personal location history records (i.e., trajectories) is the disclosure of their location and related information when a user queries for statistical or pattern mining results such as frequent locations derived from these records. In this paper, we investigate how one can achieve the privacy goal that the inclusion of his...

Journal:
  • CoRR

Volume abs/1702.00535

Publication date 2016